Single mode for multi column group by -- Almost 2x for ClickBench Q32 #11792

jayzhan211 · 2024-08-03T03:02:33Z

Which issue does this PR close?

Production ready PR

Closes #.

Rationale for this change

What changes are included in this PR?

Are these changes tested?

Are there any user-facing changes?

┏━━━━━━━━━━━━━━┳━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━┓
┃ Query        ┃       main ┃ single-mode-groupby ┃        Change ┃
┡━━━━━━━━━━━━━━╇━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━┩
│ QQuery 0     │     0.43ms │              0.48ms │  1.11x slower │
│ QQuery 1     │    40.66ms │             41.20ms │     no change │
│ QQuery 2     │    78.22ms │             76.61ms │     no change │
│ QQuery 3     │    74.18ms │             61.82ms │ +1.20x faster │
│ QQuery 4     │   449.60ms │            421.74ms │ +1.07x faster │
│ QQuery 5     │   706.87ms │            708.16ms │     no change │
│ QQuery 6     │    37.11ms │             37.35ms │     no change │
│ QQuery 7     │    41.30ms │             39.29ms │     no change │
│ QQuery 8     │   757.87ms │            673.52ms │ +1.13x faster │
│ QQuery 9     │   675.59ms │            678.64ms │     no change │
│ QQuery 10    │   204.05ms │            193.12ms │ +1.06x faster │
│ QQuery 11    │   233.20ms │            238.46ms │     no change │
│ QQuery 12    │   772.65ms │            761.90ms │     no change │
│ QQuery 13    │  1386.74ms │           1080.38ms │ +1.28x faster │
│ QQuery 14    │  1047.01ms │            758.85ms │ +1.38x faster │
│ QQuery 15    │   530.34ms │            501.92ms │ +1.06x faster │
│ QQuery 16    │  1770.70ms │           1337.31ms │ +1.32x faster │
│ QQuery 17    │  1753.15ms │           1301.97ms │ +1.35x faster │
│ QQuery 18    │  4425.26ms │           2690.27ms │ +1.64x faster │
│ QQuery 19    │    67.44ms │             57.48ms │ +1.17x faster │
│ QQuery 20    │  1687.14ms │           1577.46ms │ +1.07x faster │
│ QQuery 21    │  1945.38ms │           1822.41ms │ +1.07x faster │
│ QQuery 22    │  4323.40ms │           4102.97ms │ +1.05x faster │
│ QQuery 23    │  8903.54ms │           8579.50ms │     no change │
│ QQuery 24    │   493.88ms │            605.72ms │  1.23x slower │
│ QQuery 25    │   498.31ms │            505.79ms │     no change │
│ QQuery 26    │   566.23ms │            580.24ms │     no change │
│ QQuery 27    │  1394.61ms │           1554.94ms │  1.11x slower │
│ QQuery 28    │ 10598.40ms │          10953.84ms │     no change │
│ QQuery 29    │   419.42ms │            423.21ms │     no change │
│ QQuery 30    │   861.82ms │            765.23ms │ +1.13x faster │
│ QQuery 31    │   971.98ms │            819.20ms │ +1.19x faster │
│ QQuery 32    │  9351.65ms │           4740.51ms │ +1.97x faster │
│ QQuery 33    │  4043.72ms │           4451.66ms │  1.10x slower │
│ QQuery 34    │  3632.00ms │           4386.58ms │  1.21x slower │
│ QQuery 35    │  1095.49ms │            999.99ms │ +1.10x faster │
│ QQuery 36    │   144.76ms │            145.26ms │     no change │
│ QQuery 37    │   103.17ms │            104.42ms │     no change │
│ QQuery 38    │   106.97ms │            105.39ms │     no change │
│ QQuery 39    │   385.52ms │            275.47ms │ +1.40x faster │
│ QQuery 40    │    34.35ms │             32.81ms │     no change │
│ QQuery 41    │    32.84ms │             31.17ms │ +1.05x faster │
│ QQuery 42    │    40.06ms │             41.04ms │     no change │
└──────────────┴────────────┴─────────────────────┴───────────────┘
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━┓
┃ Benchmark Summary                  ┃            ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━┩
│ Total Time (main)                  │ 66687.00ms │
│ Total Time (single-mode-groupby)   │ 59265.28ms │
│ Average Time (main)                │  1550.86ms │
│ Average Time (single-mode-groupby) │  1378.26ms │
│ Queries Faster                     │         20 │
│ Queries Slower                     │          5 │
│ Queries with No Change             │         18 │
└────────────────────────────────────┴────────────┘
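For reference, the "Change" column reads as the ratio of the two timings. The sketch below recomputes two of the figures above; `speedup` is a hypothetical helper written for this note, not part of the benchmark tooling.

```rust
/// Ratio of baseline time to branch time.
/// Values above 1.0 mean the branch is faster than the baseline.
/// (Hypothetical helper for illustration only.)
fn speedup(main_ms: f64, branch_ms: f64) -> f64 {
    main_ms / branch_ms
}

fn main() {
    // QQuery 32 timings taken from the table above.
    println!("Q32:   {:.2}x faster", speedup(9351.65, 4740.51));
    // Total times taken from the benchmark summary above.
    println!("total: {:.2}x faster", speedup(66687.00, 59265.28));
}
```

So the headline "almost 2x" applies to Q32 specifically, while the whole suite improves by roughly 1.13x on this run.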

This change should only affect multi-column group by queries, so it has nothing to do with QQuery 24; that slowdown should be considered noise.

TODO

This change only switches partial/final aggregation to single mode. Repartitioning is not included inside the group by operator.
The next step is to find out whether including repartitioning inside group by is helpful.
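To make the partial/final versus single distinction concrete, here is a minimal sketch using only the standard library. It is a simplified model of the idea, not DataFusion's aggregate operator: the function names (`partial_count`, `final_merge`, `single_count`) and the `Key` type are assumptions made up for this illustration.

```rust
use std::collections::HashMap;

/// A row reduced to its two group-by columns, mirroring `group by a, b`.
type Key = (u32, u32);

/// Partial stage: each input partition builds its own hash table of counts.
fn partial_count(partition: &[Key]) -> HashMap<Key, u64> {
    let mut counts = HashMap::new();
    for key in partition {
        *counts.entry(*key).or_insert(0) += 1;
    }
    counts
}

/// Final stage: merge the per-partition tables, hashing every group key again.
fn final_merge(partials: Vec<HashMap<Key, u64>>) -> HashMap<Key, u64> {
    let mut merged = HashMap::new();
    for partial in partials {
        for (key, count) in partial {
            *merged.entry(key).or_insert(0) += count;
        }
    }
    merged
}

/// Single mode: one hash table over all rows, with no intermediate state.
fn single_count(partitions: &[Vec<Key>]) -> HashMap<Key, u64> {
    let mut counts = HashMap::new();
    for key in partitions.iter().flatten() {
        *counts.entry(*key).or_insert(0) += 1;
    }
    counts
}

fn main() {
    let partitions = vec![
        vec![(1, 1), (1, 2), (1, 1)],
        vec![(1, 2), (2, 2), (1, 1)],
    ];
    let two_phase = final_merge(partitions.iter().map(|p| partial_count(p)).collect());
    let single = single_count(&partitions);
    // Both strategies must produce the same groups and counts.
    assert_eq!(two_phase, single);
    println!("{} groups", single.len());
}
```

Both paths give the same result; the difference is that the two-phase path hashes each group key once per stage, while the single-mode path hashes each row once and keeps no intermediate tables.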

Signed-off-by: jayzhan211 <[email protected]>

jayzhan211 · 2024-08-03T03:46:04Z

datafusion/core/benches/multi_groupby.rs

+        let mut ctx = create_context(batches, Arc::clone(&schema)).unwrap();
+        b.iter(|| block_on(query(&mut ctx, "select a, b, count(*) from t group by a, b order by count(*) desc limit 10")))
+    });
+}


Gnuplot not found, using plotters backend
benchmark high cardinality time: [273.00 ms 350.24 ms 451.67 ms]
Found 1 outliers among 10 measurements (10.00%)
1 (10.00%) high mild
benchmark low cardinality time: [15.764 ms 16.531 ms 17.145 ms]

main branch

Gnuplot not found, using plotters backend
Benchmarking benchmark high cardinality: Warming up for 3.0000 s
Warning: Unable to complete 10 samples in 5.0s. You may wish to increase target time to 11.0s.
benchmark high cardinality time: [565.74 ms 964.40 ms 1.3864 s]
change: [+69.461% +175.35% +332.28%] (p = 0.01 < 0.05)
Performance has regressed.
benchmark low cardinality time: [10.736 ms 11.140 ms 11.577 ms]
change: [-35.106% -31.565% -28.118%] (p = 0.00 < 0.05)
Performance has improved.

This PR shows a small regression for low cardinality (about 16.5 ms vs. 11.1 ms on main) but a large gain for high cardinality (about 350 ms vs. 964 ms on main).
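One plausible reading of this split, offered as an assumption rather than something the thread confirms, is that a partial stage only pays off when it shrinks its input. The sketch below counts how many rows a per-partition partial aggregation would hand to the final stage for a low and a high cardinality key set; `make_partitions` and `partial_output_rows` are hypothetical helpers, and the data is synthetic.

```rust
use std::collections::HashSet;

/// Rows a partial stage would emit: one per distinct key in each partition.
fn partial_output_rows(partitions: &[Vec<(u32, u32)>]) -> usize {
    partitions
        .iter()
        .map(|p| p.iter().collect::<HashSet<_>>().len())
        .sum()
}

/// Build synthetic partitions whose two-column keys cycle through
/// `distinct` different values (deterministic, no randomness).
fn make_partitions(
    num_partitions: u32,
    rows_per_partition: u32,
    distinct: u32,
) -> Vec<Vec<(u32, u32)>> {
    (0..num_partitions)
        .map(|p| {
            (0..rows_per_partition)
                .map(|i| {
                    let id = (p * rows_per_partition + i) % distinct;
                    // Split the id into two columns; the mapping is one-to-one.
                    (id / 1000, id % 1000)
                })
                .collect()
        })
        .collect()
}

fn main() {
    // 4 partitions of 1000 rows each, so 4000 input rows in both cases.
    let low = make_partitions(4, 1000, 10);
    let high = make_partitions(4, 1000, 4000);
    println!("low cardinality:  {} rows after partial", partial_output_rows(&low));
    println!("high cardinality: {} rows after partial", partial_output_rows(&high));
}
```

With 10 distinct keys the partial stage collapses 4000 rows to 40, so the final stage has little left to do. With 4000 distinct keys it emits all 4000 rows again, so in this model the partial stage does a full hash-table build without reducing anything.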

Signed-off-by: jayzhan211 <[email protected]>

alamb · 2024-08-04T11:42:36Z

Thanks @jayzhan211 -- this looks quite interesting. I will try to study this PR carefully early this week.

alamb · 2024-08-05T16:15:44Z

I am running some benchmarks on this one

alamb · 2024-08-05T17:29:38Z

I also measured some non-trivial improvements (16 cores), along with some slowdowns:

--------------------
Benchmark clickbench_1.json
--------------------
┏━━━━━━━━━━━━━━┳━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━┓
┃ Query        ┃  main_base ┃ single-mode-groupby ┃        Change ┃
┡━━━━━━━━━━━━━━╇━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━┩
│ QQuery 0     │     0.65ms │              0.66ms │     no change │
│ QQuery 1     │    70.34ms │             69.76ms │     no change │
│ QQuery 2     │   124.93ms │            125.33ms │     no change │
│ QQuery 3     │   131.01ms │            129.34ms │     no change │
│ QQuery 4     │   917.67ms │            971.24ms │  1.06x slower │
│ QQuery 5     │  1061.52ms │           1103.04ms │     no change │
│ QQuery 6     │    65.61ms │             67.14ms │     no change │
│ QQuery 7     │    73.85ms │             72.77ms │     no change │
│ QQuery 8     │  1422.06ms │           1199.52ms │ +1.19x faster │
│ QQuery 9     │  1281.64ms │           1347.37ms │  1.05x slower │
│ QQuery 10    │   446.75ms │            449.22ms │     no change │
│ QQuery 11    │   479.20ms │            512.66ms │  1.07x slower │
│ QQuery 12    │  1149.46ms │           1148.56ms │     no change │
│ QQuery 13    │  2360.71ms │           1915.35ms │ +1.23x faster │
│ QQuery 14    │  1549.49ms │           1217.84ms │ +1.27x faster │
│ QQuery 15    │  1054.82ms │           1090.16ms │     no change │
│ QQuery 16    │  2843.22ms │           2396.08ms │ +1.19x faster │
│ QQuery 17    │  2707.32ms │           2313.85ms │ +1.17x faster │
│ QQuery 18    │  5627.90ms │           4249.07ms │ +1.32x faster │
│ QQuery 19    │   119.94ms │            122.76ms │     no change │
│ QQuery 20    │  1614.43ms │           1635.75ms │     no change │
│ QQuery 21    │  1994.01ms │           2007.84ms │     no change │
│ QQuery 22    │  4810.56ms │           4824.01ms │     no change │
│ QQuery 23    │ 11246.33ms │          11301.58ms │     no change │
│ QQuery 24    │   754.54ms │            745.52ms │     no change │
│ QQuery 25    │   673.00ms │            663.95ms │     no change │
│ QQuery 26    │   825.60ms │            812.36ms │     no change │
│ QQuery 27    │  2487.83ms │           2442.42ms │     no change │
│ QQuery 28    │ 15504.67ms │          15553.00ms │     no change │
│ QQuery 29    │   565.35ms │            563.80ms │     no change │
│ QQuery 30    │  1286.72ms │           1109.49ms │ +1.16x faster │
│ QQuery 31    │  1608.83ms │           1251.64ms │ +1.29x faster │
│ QQuery 32    │  7555.94ms │           4070.29ms │ +1.86x faster │
│ QQuery 33    │  4877.63ms │           4743.06ms │     no change │
│ QQuery 34    │  4887.61ms │           4842.37ms │     no change │
│ QQuery 35    │  1763.00ms │           1464.69ms │ +1.20x faster │
│ QQuery 36    │   314.89ms │            322.15ms │     no change │
│ QQuery 37    │   219.26ms │            219.07ms │     no change │
│ QQuery 38    │   184.48ms │            189.16ms │     no change │
│ QQuery 39    │   990.12ms │            601.18ms │ +1.65x faster │
│ QQuery 40    │    85.76ms │             81.27ms │ +1.06x faster │
│ QQuery 41    │    79.70ms │             77.11ms │     no change │
│ QQuery 42    │    95.45ms │             93.21ms │     no change │
└──────────────┴────────────┴─────────────────────┴───────────────┘

It seems to have hurt TPCH a bit more:

--------------------
Benchmark tpch_mem_sf1.json
--------------------
┏━━━━━━━━━━━━━━┳━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━┓
┃ Query        ┃ main_base ┃ single-mode-groupby ┃        Change ┃
┡━━━━━━━━━━━━━━╇━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━┩
│ QQuery 1     │  104.42ms │            371.48ms │  3.56x slower │
│ QQuery 2     │   24.85ms │             25.11ms │     no change │
│ QQuery 3     │   39.93ms │             42.32ms │  1.06x slower │
│ QQuery 4     │   32.57ms │             32.83ms │     no change │
│ QQuery 5     │   62.08ms │             60.75ms │     no change │
│ QQuery 6     │    8.38ms │              8.65ms │     no change │
│ QQuery 7     │  115.87ms │            112.41ms │     no change │
│ QQuery 8     │   27.71ms │             27.06ms │     no change │
│ QQuery 9     │   62.80ms │             64.47ms │     no change │
│ QQuery 10    │   72.36ms │             69.71ms │     no change │
│ QQuery 11    │   63.73ms │             63.81ms │     no change │
│ QQuery 12    │   28.73ms │             28.54ms │     no change │
│ QQuery 13    │   39.70ms │             39.93ms │     no change │
│ QQuery 14    │   11.17ms │             11.14ms │     no change │
│ QQuery 15    │   20.03ms │             21.21ms │  1.06x slower │
│ QQuery 16    │   26.95ms │             23.04ms │ +1.17x faster │
│ QQuery 17    │   95.26ms │            102.77ms │  1.08x slower │
│ QQuery 18    │  226.12ms │            237.74ms │  1.05x slower │
│ QQuery 19    │   28.43ms │             29.93ms │  1.05x slower │
│ QQuery 20    │   45.35ms │             35.96ms │ +1.26x faster │
│ QQuery 21    │  172.54ms │            168.60ms │     no change │
│ QQuery 22    │   13.48ms │             14.03ms │     no change │
└──────────────┴───────────┴─────────────────────┴───────────────┘
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━┓
┃ Benchmark Summary                  ┃           ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━┩
│ Total Time (main_base)             │ 1322.44ms │
│ Total Time (single-mode-groupby)   │ 1591.48ms │
│ Average Time (main_base)           │   60.11ms │
│ Average Time (single-mode-groupby) │   72.34ms │
│ Queries Faster                     │         2 │
│ Queries Slower                     │         6 │
│ Queries with No Change             │        14 │
└────────────────────────────────────┴───────────┘

--------------------
Benchmark tpch_sf1.json
--------------------
┏━━━━━━━━━━━━━━┳━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━┓
┃ Query        ┃ main_base ┃ single-mode-groupby ┃        Change ┃
┡━━━━━━━━━━━━━━╇━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━┩
│ QQuery 1     │  227.72ms │            330.23ms │  1.45x slower │
│ QQuery 2     │  125.12ms │            128.35ms │     no change │
│ QQuery 3     │  124.25ms │            125.15ms │     no change │
│ QQuery 4     │   95.52ms │             93.86ms │     no change │
│ QQuery 5     │  172.74ms │            174.37ms │     no change │
│ QQuery 6     │   58.73ms │             63.04ms │  1.07x slower │
│ QQuery 7     │  203.25ms │            217.22ms │  1.07x slower │
│ QQuery 8     │  158.45ms │            164.02ms │     no change │
│ QQuery 9     │  257.60ms │            258.15ms │     no change │
│ QQuery 10    │  230.00ms │            219.76ms │     no change │
│ QQuery 11    │   96.91ms │            101.00ms │     no change │
│ QQuery 12    │  133.99ms │            129.95ms │     no change │
│ QQuery 13    │  282.25ms │            288.57ms │     no change │
│ QQuery 14    │   87.25ms │             85.62ms │     no change │
│ QQuery 15    │  118.44ms │            149.96ms │  1.27x slower │
│ QQuery 16    │   84.88ms │             62.51ms │ +1.36x faster │
│ QQuery 17    │  221.45ms │            218.67ms │     no change │
│ QQuery 18    │  315.36ms │            324.40ms │     no change │
│ QQuery 19    │  159.36ms │            156.41ms │     no change │
│ QQuery 20    │  143.00ms │            122.93ms │ +1.16x faster │
│ QQuery 21    │  278.87ms │            291.86ms │     no change │
│ QQuery 22    │   67.32ms │             68.52ms │     no change │
└──────────────┴───────────┴─────────────────────┴───────────────┘
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━┓
┃ Benchmark Summary                  ┃           ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━┩
│ Total Time (main_base)             │ 3642.44ms │
│ Total Time (single-mode-groupby)   │ 3774.56ms │
│ Average Time (main_base)           │  165.57ms │
│ Average Time (single-mode-groupby) │  171.57ms │
│ Queries Faster                     │         2 │
│ Queries Slower                     │         4 │
│ Queries with No Change             │        16 │
└────────────────────────────────────┴───────────┘

Maybe we can do some profiling and figure out whether there is a way to recover the performance on the queries that got slower.

alamb · 2024-08-08T11:20:45Z

Here are some more thoughts about why this approach works and an alternate idea of how we can make these queries all faster: #11680 (comment)

github-actions · 2024-10-08T01:59:37Z

Thank you for your contribution. Unfortunately, this pull request is stale because it has been open 60 days with no activity. Please remove the stale label or comment or this will be closed in 7 days.

jayzhan211 added 8 commits August 1, 2024 20:22

convert to single

54003ff

Signed-off-by: jayzhan211 <[email protected]>

queries

c427e75

Signed-off-by: jayzhan211 <[email protected]>

reuse-hash benchmark

1abd5e1

Signed-off-by: jayzhan211 <[email protected]>

low cardinality

5b5bf9d

Signed-off-by: jayzhan211 <[email protected]>

v2

f55b639

Signed-off-by: jayzhan211 <[email protected]>

result

40d6a0a

Signed-off-by: jayzhan211 <[email protected]>

production ready

6317a79

Signed-off-by: jayzhan211 <[email protected]>

production ready

c0e195f

Signed-off-by: jayzhan211 <[email protected]>

github-actions bot added core Core DataFusion crate sqllogictest SQL Logic Tests (.slt) labels Aug 3, 2024

jayzhan211 added 4 commits August 3, 2024 11:04

rm test1

a203b57

Signed-off-by: jayzhan211 <[email protected]>

merge benchmark code

b7e77f7

Signed-off-by: jayzhan211 <[email protected]>

fix test

02c79ba

Signed-off-by: jayzhan211 <[email protected]>

clippy

142d6ed

Signed-off-by: jayzhan211 <[email protected]>

jayzhan211 changed the title ~~Single mode for multi column group by~~ Single mode for multi column group by -- Almost 2x for ClickBench Q32 Aug 3, 2024

jayzhan211 commented Aug 3, 2024

jayzhan211 added 4 commits August 3, 2024 19:19

Merge remote-tracking branch 'upstream/main' into single-mode-groupby

747e925

fix test

d1ca792

Signed-off-by: jayzhan211 <[email protected]>

Merge remote-tracking branch 'upstream/main' into single-mode-groupby

d7f3086

rm long running test

6fa6844

Signed-off-by: jayzhan211 <[email protected]>

jayzhan211 marked this pull request as ready for review August 4, 2024 02:14

alamb mentioned this pull request Aug 4, 2024

DataFusion weekly project plan (Andrew Lamb) - July 29, 2024 #11710

Closed

8 tasks

alamb mentioned this pull request Aug 5, 2024

DataFusion weekly project plan (Andrew Lamb) - Aug 5, 2024 #11826

Closed

6 tasks

jayzhan211 marked this pull request as draft August 5, 2024 23:45

Rachelint mentioned this pull request Sep 4, 2024

Improve performance of high cardinality grouping by reusing hash values #11680

Open

github-actions bot added the Stale PR has not had any activity for some time label Oct 8, 2024

jayzhan211 closed this Oct 8, 2024
